-
Notifications
You must be signed in to change notification settings - Fork 593
HDDS-10633. Support Content-MD5 header for checking object integrity when uploading object #9612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Note:
|
|
Thanks @jojochuang for pointing this out. I agree that the final ETag may not always be an MD5 (especially for multipart or encrypted objects). However, this patch focuses on in-transit integrity checking via the During the upload, S3G calculates the MD5 of the incoming data stream to verify it against the client-provided |
# Conflicts: # hadoop-ozone/integration-test-s3/src/test/java/org/apache/hadoop/ozone/s3/awssdk/v2/AbstractS3SDKV2Tests.java # hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
ivandika3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hevinhsu for the patch and the extensive tests. Overall LGTM. Just left a few comments.
| String eTag = DatatypeConverter.printHexBinary(digest).toLowerCase(); | ||
| String clientContentMD5 = getHeaders().getHeaderString(S3Consts.CHECKSUM_HEADER); | ||
| if (clientContentMD5 != null) { | ||
| CheckedRunnable<IOException> checkContentMD5Hook = () -> { | ||
| S3Utils.validateContentMD5(clientContentMD5, eTag, key); | ||
| }; | ||
| ozoneOutputStream.getKeyOutputStream().setPreCommits(Collections.singletonList(checkContentMD5Hook)); | ||
| } | ||
| ozoneOutputStream.getMetadata().put(OzoneConsts.ETAG, eTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Since now MD5 hash is used for both ETag and content-md5 verification, let's rename variable eTag to md5Hash. This also applied to other places.
Let's also change getMessageDigestDistance and ETAG_PROVIDER to something like getMD5DigestInstance and MD5_PROVIDER.
| assertEquals("\"37b51d194a7513e45b56f6524f2d51f2\"", getObjectResponse.eTag()); | ||
| } | ||
|
|
||
| @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong omission?
Yes, ETag can technically be any hash that based on the object data so it doesn't need to be MD5. Moreover, object uploaded using multipart upload ETag is a hash of all the ETag of all the parts with the "-<num_parts>" prefix, see If we decide in the future to change ETag to not be md5 hash, then we need to setup a new MD5 |
What changes were proposed in this pull request?
Add support for the
Content-MD5header to verify object integrity during object uploads.Please describe your PR in detail:
Content-MD5header using pre-commit hooks before the key is committed.Content-MD5Content-MD5KeyDataStreamOutputto simplify unit test setup.What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-10633
How was this patch tested?
https://github.com/hevinhsu/ozone/actions/runs/20842334193